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Abstract 

Density Functional Theory (DFT) is one of the most used ab initio theoretical frameworks in materials science. It 
derives the ground state properties of a multi-atomic ensemble directly from the computation of its one-particle density 
n(r). In DFT-based simulations the solution is calculated through a chain of successive self-consistent cycles; in each 
cycle a series of coupled equations (Kohn-Sham) translates to a large number of generalized eigenvalue problems whose 
eigenpairs are the principal means for expressing n(r). A simulation ends when n(r) has converged to the solution 
within the required numerical accuracy. This usually happens after several cycles, resulting in a process calling for the 
solution of many sequences of eigenproblems. In this paper, the authors report evidence showing unexpected correlations 
between adjacent eigenproblems within each sequence. By investigating the numerical properties of the sequences of 
generalized eigenproblems it is shown that the eigenvectors undergo an "evolution" process. At the same time it is 
shown that the Hamiltonian matrices exhibit a similar evolution and manifest a specific pattern in the information they 
carry. Correlation between eigenproblems within a sequence is of capital importance: information extracted from the 
simulation at one step of the sequence could be used to compute the solution at the next step. Although they are not 
explored in this work, the implications could be manifold: from increasing the performance of material simulations, to 
the development of an improved iterative solver, to modifying the mathematical foundations of the DFT computational 
paradigm in use, thus opening the way to the investigation of new materials. 
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Density Functional Theory [212] is a very effective theoret- 
ical framework for studying complex quantum mechanical 
problems in solid and liquid systems. DFT-based methods 
are growing as the standard tools for simulating new mate- 
rials. Simulations aim at recovering and predicting phys- 
ical properties (electronic structure, total energy differ- 
ences, magnetic properties, etc.) of large molecules as well 
as systems made of many hundreds of atoms. DFT reaches 
this result by solving self-consistently a rather complex set 
of quantum mechanical equations leading to the computa- 
tion of the one-particle density n(r) , from which physical 
properties are derived. 

In order to preserve self-consistency, numerical imple- 
mentations of DFT methods consist of a series of iterative 
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cycles; at the end of each cycle a new density is computed 
and compared to the one calculated in the previous cy- 
cle. The end result is a series of successive densities con- 
verging to an n(r) approximating the exact density within 
the desired level of accuracy [3]. Each cycle consists of 
a complex series of operations: calculation of Kohn-Sham 
potential, basis functions generation, numerical integra- 
tion, generalized eigenproblems solution, and ground state 
energy computation. In one particular DFT implemen- 
tation, namely the Full-potential Linearized Augmented 
Plane Wave (FLAPW) method [H [5] , matrix entry initial- 
ization and generalized eigenvalue problem solution are the 
most time consuming stages in each iterative cycle (Fig. [I]). 

The cost for completing these stages is directly related 
to the number of generalized eigenproblems involved and 
to their size. Above a certain threshold, the eigenprob- 
lem size is proportional to the third power of the num- 
ber of atoms of the physical system, while the number 
of eigenproblems ranges from a few to several hundred 
per cycle. Typically, each of the problems is dense and 
a significant fraction of the spectrum is required in or- 
der to compute n(r). The nature of these eigenproblems 
forces the use of direct methods, such as those included in 
LAPACK or ScaLAPACK [2U1 [ST]- All of the most com- 
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mon simulation codes implementing the FLAPW method 
(WIEN2k, FLEUR, FLAIR, Exciting, ELK [MB), de- 
spite successfully simulating complex materials [T2TH5] . 
treat each eigenproblem of the series of iterative cycles 
in isolation. This implies that no information embedded 
in the solution of eigenproblems in one cycle is used to 
speed up the computation of problems in the next. While 
these routines provide users with accurate algorithms to 
be used as black-boxes, they do not offer a mechanism for 
exploiting extra information relative to the application. 

The line of research pursued here takes inspiration from 
the necessity of exploring a different computational ap- 
proach in an attempt to develop a high performance algo- 
rithm specifically studied for the FLAPW method. Con- 
trary to the traditional view, we look at the entire succes- 
sion of iterative cycles making up a simulation as consti- 
tuted by a few dozen sequences of generalized eigenprob- 
lems. By mathematical construction, each problem in a se- 
quence is expected to be, at most, weakly connected to the 
previous one. At odds with this observation, we present 
evidence showing that there is an unexpectedly strong cor- 
relation between eigenproblems of adjacent cycles in each 
sequence. We suggest how this extra information should 
be used to improve the performance of the current state- 
of-the-art routines. 

Recently some methods have been developed that go 
in this direction. Among them we mention the block ver- 
sion of the Krylov-Schur [16] (in itself an improved version 
of the Thick- Restart Lanczos [T7]) and the Chebyshev- 
Davidson [TB] methods. One of the most successful exam- 
ples in this sense is the recently implemented Chebyshev- 
Filtered Subspace Acceleration [19] currently included in 
the PARSEC package specifically targeting ab initio real- 
space computations. 

In Sec. [2] we introduce the reader to the DFT frame- 
work and more specifically to the FLAPW method. We 
explain in more detail the series of self-consistent cycles, 
the computational bottlenecks inside each cycle, and how, 
from this picture, the importance of sequences of eigen- 
problems emerges. In Sec. [3j we illustrate the investigative 
tools we employ in studying the eigen-sequences, namely 
eigenvector evolution and unchanging matrix patterns. We 
then present our computational results and extensively dis- 
cuss their interpretation from the numerical and physical 
point of view. Finally in Sec. [4] we draw our conclusions 
and explain how it would be possible to exploit the ex- 
perimental results to improve the performance of a DFT 
simulation. 



the one-particle electron density n(r). In turn, the wave 
functions 4>i(r), which solve the Schrodinger-like equations 
for N electrons, compute the one-particle electron density 
[eq. ([2|] used in determining the effective potential. The 
latter is explicitly written in terms of the nuclei atomic 
Coulomb potential vj(r), a Hartree term w(r,r') describ- 
ing repulsions between pairs of electrons, and the exchange 
correlation potential u X cM( r ) summarizing all other col- 
lective contributions [eq. ^3|]. This set of equations, also 
known as Kohn-Sham (KS) [2 , is solved self-consistently. 
In other words the equations must be solved subject to the 
condition that the effective potential v [n] and the electron 
density n(r) mutually agree. 
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v (r)=vi(r) + J w(r,r')n(r')dr' + v xc [n](r) (3) 

Computational implementations of DFT depend on the 
particular modeling of the effective potential and on the 
orbital basis used to parametrize the eigenfunctions (f>i(r). 
In the context of periodic solids, the vector k and band v 
indices replace the generic index i\ the Bloch vector k is an 
element of a three-dimensional Brillouin zone discretized 
over a finite set of values, called the set of k-points. In 
the FLAPW method [2 [5], the orbital function 0k >w (r) 
are expanded in terms of a function basis set ^c(k, r) 
indexed by vectors G lying in the lattice reciprocal to the 
configuration space 



Mr) = E 

|G+k|<K„ 



cJLV>G(k,r). 
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In FLAPW, the configuration (physical) space of the quan- 
tum sample is divided into spherical regions - called Muffin- 
Tin (MT) spheres - centered around atomic nuclei, and 
interstitial areas between the MT spheres. Within the vol- 
ume of the solid's unit cell f2, the basis set ^c(k,r) takes 
a different expression depending on the region 
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2. Physical framework 

DFT methods are based on the simultaneous solution of 
a set of Schrodinger-like equations [eq. 0] . These equa- 
tions are determined by a Hamiltonian operator H that, 
in addition to a kinetic energy operator, contains an ef- 
fective potential vq [n] , which functionally depends only on 



For each atom a, the coefficents a^ G (k) and 6^ n G (k) 
are determined by imposing continuity of the wavefunc- 
tions (f>u.,v( r ) and their derivatives at the boundary of the 
MT sphere. The Yi m (r a ) are spherical harmonics of the di- 
atom, r a = ^r- is a unit vector, and r a is the distance from 
the MT center. The radial functions uf(r) and their time 
derivatives iif (r) are obtained by a simplified Schrodinger 
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Figure 1: A schematic rendering of the principal stages of the self-consistent cycle. Colors indicate the breakdown of the computing 
time of each stage of the cycle. Each "time usage" is based on an average for simulations of very diverse physical systems. Each 
simulation was run on a single core using a sequential version of the FLEUR code. 



equation, written for a given energy level Ei, containing 
only the spherical part of the effective potential v$ ph (r) 

f h 2 8 2 h 2 1(1 + 1) , , 1 „. . 

(5) 

Thanks to this expansion, the KS equations naturally 
translate to a set of generalized eigenvalue problems 

£ t A GG< (k) - A k „B GG , (k)] cZ = (6) 

G' 

where the coefficients of the expansion c{^, are the eigen- 
vectors, while the Hamiltonian and overlap matrices A and 
B are given by volume integrals and a sum over all MT 
spheres 

{A(k),B(k)} = £ I ^ G (k,r){ff,i}^ G ,(k,r). (7) 



In practical numerical computations, a solution is reached 
by setting up a multi-stage cycle (Fig. [I]) . An initial edu- 
cated guess for n(r) is used to calculate the effective full- 
potential vo[n] using the Pseudo-Charge [5] in combination 
with Fast Fourier Transform (FFT) methods. The poten- 
tial, in turn, is inserted into the simplified Schrodingcr 
equation ^ whose solutions, together with the coefficents 
a^ G (k) and 6^ n G (k), lead to the basis functions ^ G (k, r). 
The latter are used to calculate the entries of the matrices 
A(k) and £>(k), an operation that requires the computa- 
tion of three-dimensional spherical integrals [eq. Q] for 
the non-spherical part of the potential appearing in H . 

In the next stage the matrices just computed are the 
input in dozens to hundreds of generalized eigenvalue prob- 
lems [eq. ([6])] that are solved simultaneously. Each eigen- 
problem is of the form Ax = XBx, where both A and 
B are dense hermitian matrices, B is additionally posi- 



tive definite, and x and A form a sought-after eigenpair. 
In FLAPW-related applications, usually only a fraction of 
the lower part of the spectrum is computed and retained 
based on the Fermi energy value. The stored eigenpairs are 
then used to evaluate the ground state energy of the phys- 
ical system, a step which is followed by the computation 
of a new charge density n'(r). 

At the end of the cycle, convergence is checked by com- 
paring n'(r) with n(r). If |n'(r) — n(r)| > rj, where T) is the 
required accuracy, a suitable mixing of the two densities 
is selected as a new guess, and the cycle is repeated. This 
process is properly referred to as an outer-iteration of the 
DFT self-consistent cycle. Convergence is guaranteed by 
the Hohenberg-Kohn theorem [3] stating that there exists 
a unique electron density no(r) locally minimizing an en- 
ergy functional E[n] closely related with the Hamiltonian 
operator H. 

In sum, the FLAPW self-consistent scheme is formed 
by a series of outer-iterations, each one containing multi- 
ple large generalized eigenproblems. In order to numeri- 
cally compute the charge density n(r) at each iteration, 
the matrices A and B need to be initialized for each It- 
point and the generalized eigenproblem Ax — XBx solved. 
These two stages are the most machine-time consuming 
part of the cycle, each accounting for between 40% and 
48% of the total computational time (Fig. [I]). Moreover, 
the more complex the material, the larger the matrices and 
the slower the convergence, resulting in an increase in the 
number of outer-iterations. 

3. An alternative viewpoint 

The results presented in this paper originate from the de- 
liberate choice of studying the DFT self-consistent cycle 
from a different perspective. The entire outer-iterative 
process is regarded as a set of sequences of eigenproblems 
Pi. This interpretation is based on the observation that, 
for each k-point, the solution of a problem at a certain 
iteration Pi (k) is a prerequisite for setting up the next one 

Considering the single eigenproblems Pi (the k index 
is suppressed for the sake of simplicity) to be part of a se- 
quence {Pi} can have far-reaching consequences: it might 
help to unravel correlations among them and ultimately 
lead to the conception of an entirely different computa- 
tional approach to solving them as the simulation pro- 
gresses. Since DFT is one of the most important ab initio 
electronic structure frameworks, the study of a compu- 
tational procedure that would lead to high-performance 
solutions to {Pi} is of crucial importance. 

In order to study the evolution of the generalized eigen- 
problems as part of the sequence {Pi}, we focus our atten- 
tion on the transformation of eigenvectors and the vari- 
ation of the matrix entries of the Hamiltonian matrix A 
(the same could be done for the overlap matrix B). For a 
fixed k, each eigenvector at iteration i is compared with 



its corresponding eigenvector at iteration i + 1. A similar 
comparison is performed between the values of the entries 
of adjacent Hamiltonian matrices. Despite the apparent 
simplicity of the strategy, the realization of a comparative 
tool that quantitatively describes the evolution of {Pi} is 
not a trivial matter. 

3.1. Eigenvector evolution 

In this section we focus on a generic sequence and describe 
a procedure to study the evolution of the eigenvectors solv- 
ing for Pi. The results obtained are independent of k since 
each such point represent a vector in the Brillouin zone for 
which there is an independent sequence of eigenproblems. 
As a consequence our analysis can be applied to any se- 
quence in the simulation. In order to carry out our plan, 
we need an associative criterion that allows comparison 
between eigenvectors of successive iterations. This is not 
a simple task since the ordering of a set of eigenpairs can 
change substantially from one iteration to the next. 

For instance, one could arrange the eigenvectors by the 
increasing magnitude of their respective eingenvalues and 



compare two eigenvectors, say and x 



„(*+!) 



with the 



same eigenvalue index I. This naive comparison is bound 
to fail due to the fact that eigenvalues close in magni- 
tude often swap positions across iterations. Consequently, 
identifying eigenvectors becomes rather difficult as the se- 
quence advances. The nature of the self-consistent process 
interferes with the ability to find a one-to-one correspon- 
dence between vectors of neighboring iterations. 

3.1.1. Computational scheme 

For a correct comparison between adjacent eigenvectors we 
developed an algorithm that establishes a one-to-one corre- 
spondence based on two observations: 1) a DFT simulation 
is basically a minimization procedure, and as such favors 
small eigenpair variations in its progress towards conver- 
gence, and 2) all eigenpairs contribute more or less "demo- 
cratically" to the progression of the sequence. Although 
not fully mathematically rigorous, these statements find 
their a posteriori justification in the formal comparison 
between orbitals of two successive iteration cycles (see Ap- 
pendix [A]), and translate directly into two specific behav- 
iors of the eigensolutions. First, scalar products between 

(i) 

an eigenvector Xj at iteration i and any of the eigenvec- 
tors a;? at iteration i + 1 have a gaussian distribution 

ft) 

narrowly peaked at around one value cr) ■ Second, the 

set of largest scalar products, {oj }, has a flat and almost 
constant distribution. In mathematical terms, they can be 
written as 
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These observations motivated the design of a routine 
that, without claiming to be unique or optimized, succeed 
to correctly relate two successive eigenvectors. Specifi- 

(i) 

cally, it identifies, for each eigenvector Xj, the largest 
scalar product cr- subject to the condition that the (i+1)- 
iteration index £, associated with j, is not associated with 
any other j' ^ j. By design, this procedure establishes 
a one-to-one correspondence between the eigenvectors of 



successive iterations, x 



whose information is 



stored in a permutation operator II 
V j, 3! I; j=Il(I) and V £, 31 j 



n 



(io) 

Using II, the column positions in the matrix of scalar prod- 



ucts 



(x {l) x {l+1) ) 



can be rearranged so as to easily obtain 

the largest scalar products {cj*'} from the main diagonal. 
From this matrix we can easily extract the subspace de- 
viation angles, automatically normalized to one, between 
corresponding eigenvectors of adjacent iterations 



3 « 



diag 



.00 Ji+^y 



(11) 



These angles provide the means for studying the evolution 
of the eigenvectors of the sequence of generalized eigen- 
problems {Pi}. 

Collecting all the angles computed in one simulation 
results in a large set of data (there are N = dim(A) angles 
for each iteration and each k). For our statistical analysis, 
we manipulate the angles so as to plot them in three dif- 
ferent ways depending on which parameter characterizing 
the data is kept fixed. First, fixing the iteration index and 
a specific eigenvalue, we look at how the angles are dis- 
tributed among the ks. Then, we choose a random k and 
look at how all the deviation angles vary as the sequence 
progresses. Finally we select an eigenvalue and examine 
the evolution of the angles for all k as the iteration index 
increases. 

In order to perform the entire computational process, 
from eigenvector pairing to deviation angle plotting, we 
built a Matlab analysis toolkit. The input is the set of 
matrices A and B of all the eigenproblems appearing in 
the sequences of a simulation. Simulations of the phys- 
ical systems analyzed were performed using the FLEUR 
code [8] running on JUROPA, a powerful cluster-based 
computer operating in the Supercomputing Center of the 
Forschungszentrum Jiilich. For each physical system stud- 
ied we produced outputs for a consistent range of param- 
eters. 

Table 1: Simulation data 
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3.1.2. Experimental evidence 

We present here a numerical study for two typical physical 
systems. The first one is a 5 layer film of iron, denoted by 
Fe$i, with a (100) surface orientation modeled by a sim- 
ple tetragonal lattice containing 5 atoms in the unit cell 
embedded in two semi-infinite vacua. The second exam- 
ple, zinc oxide, is an ionic bonded material arranged on a 
wurtzite lattice - a multilayered hexagonal lattice with 2 
Zn and 2 O atoms per unit cell. For each material we ran 
a simulation whose specifics are described in Table [T] 
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Figure 2: Histograms showing a qualitative distribution of the 
deviation angles of each k-point eigenvector corresponding to 
the four lowest eigenvalues at a fixed iteration. All angles are 
normalized to one (1.00 = ir/2). Above: the Fe$e case plotted 
at the 3 r iteration. Below: the zinc oxide case plotted at the 



For each case study we plot a set of histograms showing, 
for all k-points, the distribution of angle deviations for the 
four smallest eigenvalues at a specifically chosen iteration 
(Fig. [2]). In each histogram the distribution is sharply 
peaked at the lowest end of the interval and has null or only 
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negligible tails. This result supports our analysis as to the 
"democratic" contribution of all angles to the progression 
of the sequence [eq. (J9J)]. 
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Figure 3: Eigenvector angles of successive iterations for the iron 
multi-layer: evolution of angles for all eigenvectors of the se- 
quence corresponding to "k-point 11 (above) and the evolution of 
angles for all 15 k-points of eigenvectors corresponding to the 
7* lowest eigenvalue (below). All the angles are normalized to 
one (1.00 = ir/2). 

For both physical systems we also show two diagrams, 
one that plots angles for all eigenvectors at one specific 
k- value and the other that presents the angles of the j th 
eigenvector at each k-point (Fig. 3]4). Both graphs are 
plotted against the iteration index on a semi-log scale to 
better display the evolution of the deviation angles [j] . 
We can immediately notice the almost monotonic decrease 
of the deviation angles as the sequences progress towards 
convergence. Small upward oscillations are probably due 
to an excess of localized charge that may cause a partial 
restart of the sequence. We have also observed that the 
angles corresponding to the lowest 20% of the spectrum 
are, on average, higher than the rest. 




Iterations (2 -> 9) 

Evolution of subspace angle for eigenvector 7 and all 40 k-points 
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1 In all graph labels the abbreviation "eigs" stands for the word 
eigenvalues while the symbol "b/w" stands for the word between. 
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Figure 4: Eigenvectors angles of successive iterations for zinc 
oxide: evolution of angles for all eigenvectors of the sequence 
corresponding to k-point 21 (above) and the evolution of an- 
gles for all 40 k-points of eigenvectors corresponding to the 7 th 
lowest eigenvalue (below). All the angles are normalized to one 

(1.00 = 71-/2;. 



In all other multi-atomic systems studied, besides the 
ones shown here, the great majority of angles after the 
3 rd or A th iteration are very small. Contrary to intuition 
the simulation is far from converged at this stage, imply- 
ing again a sort of "democracy" of contribution, where all 
eigenvectors positively influence the process of minimizing 
the energy functional that depends on n(r). This behavior 
has a universal character since we observed it in the bulk, 
layer, metallic, and ionic materials we analyzed. 

In order to give a more quantitative flavor of the eigen- 
vector evolution we have tabulated the mean angle value 
#W for an iteration at the beginning and one at the end of 
the simulation (Table [2]). Due to their physical relevance, 
we used only deviation angles of those eigenvectors whose 
eigenvalues represent energies below Fermi level. As can 
readily be seen, the mean values at the end of the simula- 
tion are considerably smaller than those at the beginning, 
in this way confirming the qualitative picture described 
above. 
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Table 2: Deviation Angle Means 
ff of relevant eigs 
Material (% of spectrum) #( start ) 9 (cDd ^ 

Fe^ 44 (11.0 %) 1.10 1(T 2 3.64 10~ 9 

ZnO 27 (5.6 %) 7.29 10~ 2 0.61 10~ 5 

3.2. Matrix entry variation 

We systematically look at the variations in the entries in 
adjacent A matrices for two main reasons. First, we plan to 
develop a method that identifies those portions of entries 
of A that undergo little or no change at all. Eventually 
this method could be used to avoid recalculating those en- 
tries at each iteration, and in doing so saving computing 
time. Second, we believe that the connection between suc- 
cessive eigenvectors should somehow surface in how much 
the matrices defining the eigenproblems vary across itera- 
tions: both are the indirect consequence of changes in the 
set of basis wavefunctions ?/>G(k,r). 

3.2.1. Computational scheme 

Despite its manifest simplicity, comparison between ma- 
trix entries across adjacent iterations can be rather non- 
trivial. In fact, variations of the single entries of A-jp with 
^•k span a range of several orders of magnitude and 
need to be opportunely rescaled. Our initial strategy is 
to normalize all variations so as to map them onto a [0,1] 
interval. Subsequently we introduce a threshold param- 
eter p t that cuts off all variations below a certain value. 
This strategy helps in identifying those areas of the matri- 
ces where the entries undergo relatively large variations; it 
also allows us to study the percentage of entries that varies 
as a function of the cut-off value. Eventually, one can de- 
termine the value of the threshold that might be chosen 
for saving computing time while only minimally compro- 
mising the accuracy of the eigensolutions (i.e. speed vs 
accuracy) . 

First we had to establish the most appropriate metric 
to gauge the relative size of entry variation. The choice of 
the metric influences the mapping of the variations onto 
the specified interval. In this study we chose the maximal 
entry variation for each matrix difference 6^ = max(|y4^ +1 
| ) and normalized each entry of the difference with re- 
spect to it. The entries of the resulting matrix A^ = 

k s(i) — — are clearly mapped onto the [0, 1] interval. 
Then the threshold is measured as a fraction of Si, given 
by the cut-off value p t , being a number G [0,1] (e.g. a 
p t = 0.1 value means we are considering all those entries 
of A^ that are larger or equal to 10 % of It has to be 
noted that, contrary to common intuition, the lower the 
cut-off value pt, the greater the number of non-zero entries 
of A is. We complement this analysis with a more conven- 
tional approach where the median of the entries of A^' is 



compared with the median of the entries of the difference 
\A^ — A^~^\ as the sequence progresses. 

All A^ , extracted from the simulation of a given phys- 
ical system, were analyzed, at a fixed k, for different values 
of the cut-off and for different iteration levels i. As for the 
eigenvector evolution, the input for our analysis is the set 
of matrices A that defines the eigenproblems appearing 
in sequences of a simulation. All the simulations of the 
physical systems were performed using the FLEUR code 
running on JUROPA. 



IA (4) - A (3) l /b (3) >0.10- Iteration 4 




500 1000 1500 2000 2500 



/ A <26) - A <25) l/b (25) > 0.15- Iteration 26 




Figure 5: Visualization of for k = 1 at two different itera- 
tion cycles, excerpted from a CaFe?,As2 simulation. Above: plot 
of all the entries above the cut-off value p t = 0.10 at iteration 
i = 4. Below: plot of all the entries above the cut-off value 
Pt = 0.15 at iteration i = 26. 

3.2.2. Experimental evidence 

As an example of our approach we present the analysis of 
a simulation of a superconducting compound, denoted by 
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CaFe2As 2 , that undergoes a first order phase transition 
from a high temperature, tetragonal phase to a low tem- 
perature orthorhombic phase. The specific characteristics 
of this simulation are listed in Table [T] 

In Fig. H we first give a qualitative picture of the por- 
tion of A that changes, for a specific k-point at two distant 
iterations and different cut-off values. Two distinct obser- 
vations can be made: on the one hand, the empty portions 
of A tend to preserve their shape and position as the cut- 
off increases. In other words those parts of A that do not 
vary much seem to follow a specific pattern independent 
from p t . On the other hand, the percentage of entries 
that undergo variation does not seem to be affected by the 
progress of the sequence of iterations. 

Percentage of matrix entries variation vs cut off value (k=1,i=16) 

100% f , , , , , = 




1 0.8 0.6 0.4 0.2 0.1 

Cut-off value 

Figure 6: Percentage of varying matrix entries plotted versus 
cut-off values on a semilogarithmic scale for the CaFe2Asi sys- 
tem for k = 1 and i = 16. 



mum value of \A-£' — A^ ^ | with the median of the entries 

of A£ . As the simulation progresses the median value of 
the Hamiltonian matrix remains approximately constant 
(green line) while the maximum and the median value of 
the matrix variation decrease. Moreover the median value 
of lA^-A^l is, on average, 1-2 orders of magnitude 
lower than as can also be seen from Table [3] 

Can the large number of unchanging entries be used to 
speed up computations at every DFT iteration? In order 
to answer this question it needs to be understood how 
the trade off between speed and accuracy depends on the 
choice of cut-off value in relation to the iteration index 
i. On one hand the patterns and distribution of entries 
that undergo very small variation could be exploited so as 
to avoid computing them anew in each iteration. On the 
other hand the evolution of entry variation suggests that 
updating some of the entries could be completely avoided 
after a certain iteration. 



Comparing matrix entries across iterations 

10° | , , , , , 




The fact that such qualitative behavior is evident in all 
of the physical systems investigated suggests that it is a 
"universal" trait characteristic of DFT-based simulations. 
This conclusion implies that despite the fact that the basis 
functions set ij)c (k, r) changes substantially between suc- 
cessive iterations, it would seem that for certain subsets of 
basis functions such a change contributes very little to the 
volume integrals in eq.|7|. 

In Fig. [6] we give a more quantitative description, at a 
fixed iteration, of the number of matrix entries that change 
as the threshold value increases. It can be noted that the 
percentage of varied entries becomes significant only for 
Pt < 0.1. This behavior indicates that overall there are 
very few entries undergoing major changes; most of the 
variations are concentrated in the low end of the metric 
(i.e. pt x 5^ and pt < 0.1). Contrary to what is suggested 
by the eigenvector evolution, the percentage of matrix en- 
try variation (at a fixed cut-off) does not seem to decrease 
as the sequence of eigenproblems progresses (see red line in 
Fig. [7]). This is quite a surprising result signaling that un- 
changing patterns and small eigenvector deviation angles 
in {Pi} may have different origins. 

In Fig. [7] we have also compared the median and maxi- 



Figure 7: Median of the matrix entries of A-l and median and 
maximum of matrix entries of\A^ — A^ 1 '! plotted versus the 
iteration index for the CaFe2As2 system for k = 1. 

Our study makes manifest that there is room for devel- 
oping new methods for saving computational time in the 
process of updating the matrices defining the generalized 
eigenproblems in {Pi}. The possible methods and their 
optimization is still the object of current research and we 
refer the reader to future publications for more conclusive 
results. 



Table 3: Matrix Entry Medians 
Iteration Median Median Maximum 

index (i) of A? of of \AP - A^ 
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4. Correlation and exploitation 

In the previous section, we analyzed DFT-based simu- 
lations from a non-conventional perspective. Departing 
from the traditional picture that considers eigenproblems 
in each iterative cycle of a simulation as independent, we 
instead assumed that they form a set of sequences of eigen- 
problems {Pi(k)}. We then provided experimental evi- 
dence pointing out a connection between problems that 
have consecutive iteration indices. In particular, we un- 
covered a strong correlation between eigenvectors of suc- 
cessive problems as well as the existence of unchanging 
patterns in the matrices defining the eigenproblems. We 
illustrated how this correlation is noticeably linked to the 
convergence process of the simulation: as the iteration in- 
dex increases, the eigenvector deviation angles become, on 
average, smaller. Unfortunately we did not observe an 
equivalent drop in the variation of the matrix entries of the 
Hamiltonian but only in its maximum and median values. 
Nonetheless it is evident that the sequences of eigenprob- 
lems {Pi} undergo a significant evolution. 

The importance of this result stems from the fact that 
this correlation is quite unexpected. Since each single 
problem P(k) at iteration i + 1 is determined by the orbital 
wave functions obtained by the solution of all the problems 
{P(k)} at iteration i, the connection between P, and Pj+i 
is non-linear and presumably very weak. Until recently it 
was believed that such weak non-linearity would have hid- 
den any sign of correlation, a conviction that inhibited any 
research effort in this direction. The source of non-linearity 
resides in the two indirect ways each Pj is influenced by the 
basis functions ^-(k, r) computed at each new iteration. 
First, matrix entries defining the eigenproblems are given 
by volume integrals involving basis functions [eq. Q] , not 
orbital functions, and second the eigenvectors are the n- 
tuple of coefficients expressing orbital wave functions in 
terms of a linear combination of basis functions [eq. Q]. 
As a consequence of these two considerations, eigenvectors 
were considered to be very loosely connected. The discov- 
ery that the contrary to this fact is true compels us to give 
great relevance to the evidence of a strong correlation. 

While having found a strong correlation between eigen- 
problems of a sequence {Pi} is in itself an important result, 
it is even the more so because it opens the way to the ex- 
ploration of new computational strategies. In particular, 
the performance of the entire DFT simulation could be 
improved by boosting the performance of the sequences 
of eigenproblems. The idea is to take advantage of repeti- 
tive patterns in the eigenpencil (Hamiltonian and Overlap) 
and in the eigenvector evolution. In practice, the key ele- 
ment would be to reuse, for the solution of eigenproblems 
at a certain iteration, numerical quantities computed in a 
previous one. In order to clarify this concept we briefly de- 
scribe, in the following, the implication of just eigenvector 
manipulation. 

Traditionally large dense eigenproblems are solved us- 
ing direct methods whenever the fraction of the sought 



after spectrum is already above the few %-points. In the 
opposite case scenario, iterative methods are mostly used 
for sparse eigenproblems when the fraction of eigensolu- 
tions required is very small. Due to the large number 
of matrix-vector operations iterative eigensolvers do not 
perform well for dense problems and sometimes, in the 
presence of tight clusters, even fail to converge. These 
same considerations do not necessarily apply to sequences 
of dense correlated eigenproblems {Pi} when the fraction 
of eigenpairs sought after is somewhat lower than 10%. In 
this case the reuse of information can somewhat improve 
the performance of selected iterative solvers. 

Far from presenting any conclusive evidence, we point 
out that the evolution of eigenvectors could be exploited 
to improve the performance of iterative solvers and, even- 
tually, make them competitive with direct solvers when 
applied to sequences of eigenproblems. This result will de- 
pend on two distinct key properties of the iterative method 
of choice: 1) the ability to be fed previously computed 
eigenvectors and take advantage of them as a starting 
guess to efficiently compute the new ones, and 2) the ca- 
pacity to solve simultaneously for a substantial portion 
of eigenpairs. The first property would result in filtering 
away as efficiently as possible the unwanted components 
of the old eigenvectors. The second characteristic implies 
a moderate-to-substantial speedup for the matrix-vector 
multiplications as well as an improved convergence. This 
is part of a study that is underway and will be presented 
in a future publication. 

5. Conclusions 

The results described in this paper are an example of how 
it is feasible to study a mathematical model by reverse 
induction. Starting from simulations, we showed how it 
is possible to construct a method to analyze the potential 
improvements of the algorithmic realization of a mathe- 
matical model on which the simulations are based. This 
approach reverses the usual direction that goes from the- 
oretical model to experiment passing through computer- 
based simulations. In other words it is an example of how 
to look at DFT-based computations as an inverse prob- 
lem. As such we would like to refer to our approach as a 
"reverse simulation" method. 

In this work we utilize the "reverse simulation" ap- 
proach as a tool to investigate the properties of the non- 
linear generalized eigenproblem arising from the Kohn- 
Sham equations. In general such an eigenproblem is lin- 
earized through the introduction of a self-consistent cycle 
that is solved until convergence is reached. The non-linear 
problem translates then into a series of eigenproblems that 
are solved in total independence from each other. We show 
that in reality these eigenproblems are strongly correlated 
and constitute part of a sequence. We finally suggest that 
the correlation discovered could be exploited by a class of 
opportunely designed iterative solvers. 
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This result is of great impact for the community of com- 
putational physicists working in the wide field of material 
science. It could be the first step towards changing com- 
putational paradigm, especially in view of the increasing 
trend toward the use of massively parallel architectures. 
By giving "dense" DFT methods, like FLAPW, access to 
the use of iterative solvers one will effectively increase their 
scalability. Conversely a higher scalability would empower 
these ab initio methods with the capability of investigating 
larger physical systems that are currently out of reach. 

Besides affecting DFT methods that until now have 
precluded the use of iterative solvers, the methodological 
viewpoint used here will have by far more important conse- 
quences than just improving the computational approach 
to the simulations: it would allow us to go beyond the 
conventional FLAPW method and create a more efficient 
mathematical paradigm. 
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Appendix 

A. A formal justification of the eigen- 
vector computational scheme 

The formal study of eigenvector evolution is based on two 
statements given at the beginning of section 3.1.1. Of these 
statements the first one simply states the fact that the or- 
bital functions that make up the charge density have to 
already be a very good guess for the exact wavefunctions 
for the self-consistent cycle to converge. This fact trans- 
lates to the following formal analysis. 

Eq. Q written for the orbital functions at a certain 
iteration (i) is 



E 

-k|<K„ 



^(k,r). (A.l) 



If we take the scalar product of two 4>s for different band 
indices but the same k-point and the same iteration we 
straightforwardly obtain 



„*(»)G' (i)G 



*(i)G' c(i) »G 



J dv ^W(k,r)V$(M) 
(A.2) 



G'G u k,^ 



= E 

|G+k|,|G'+k|<K„ 

= E 

|G+k|,|G'+k|<K moa . 
— (Xu^fj,) ^v^i- 

By repeating the same computation for 0s from two suc- 
cessive iterations, we arrive at 



Jdv £W(r) 

E 

|G+k|,|G'+k|<K„ 

E 

|G+k|,|G'+k|<K„ 



*(i)G' (i+l)G 
c k,y L k,/x 



*(i)G'5 



Sg-g c k /i 



dr^«(k,r)^ +1) (k, 
li+1)G (A.3) 



where Sg'G is an asymmetric positive definite matrix and 
as such can be factorized according to Cholesky as 



Sec = 



|G'' 



E 

-k|<K„ 



L. 



G'G 



<L 



G"G 



(A.4) 



with Lg'G" being a lower triangular matrix. 

Let us momentarily assume the correctness of the state- 
ments in section [3.1.1 | The first statement implies that the 



left-hand side of Eq. ( A.3 ) can be expanded 
<W + eiv/i + o(e 2 ) with e being a generic numerical expan- 
sion parameter. Correspondingly, from the second state- 
ment we conclude that the matrix E u ^ should be of quasi- 
block-diagonal form with a dominant diagonal which en- 
forces only small rotations or mixing. In the same fashion 
we can arbitrarily expand the S'g'G on the right-hand side 



LL T = L^ (I + eD + o{e 2 )) L^ 1 



(A.5) 



where LW and indicate the lower triangular matri- 

ces decomposing Sq! g and S'g'G respectively. Combining 
these two expansions we arrive at the conclusion that 



.« T (i+i)\ - r *WrW L (i+i)T (i+l) 



= S u 



(A.6) 

- c r w L« D i (l+1)T c, (4+1) l + 



"k,p 



In other words, depending on the size of the numerical 
parameter e, one may expect small angle variations be- 
tween eigenvectors of successive iteration cycles. While 
this result is formally correct, it is based on an unproven 
assumption. From the analytical point of view very lit- 
tle can be said on the validity of this expansion since each 
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cycle updates the basis functions in a very non-linear man- 
ner. We reverse the usual reasoning and by assuming 
the correctness of the expansion work our way back. In 



practice, we verify numerically the consistence of Eq. A. 6 



and consequently can infer the validity of its premises. In 
other words we follow an inverse engineering problem ap- 
proach as the basis for the computational scheme of sub- 
section 13.1.11 



References 

[1] R. M. Dreizler, and E. K. U. Gross, Density Functional Theory 

(Springer- Verlag, 1990) 
[2] W. Kohn, and L. J. Sham, Phys. Rev. A 140 (1965) 1133 
[3] P. Hohenberg and W. Kohn, Phys. Rev. B 136 (1964) 864 
[4] A. J. Freeman, H. Krakauer, M. Weinert, and E. Wimmer, Phys. 

Rev. B 24 (1981) 864. 
[5] A. J. Freeman, and H. J. F. Jansen, Phys. Rev. B 30 (1984) 561 
[6] M. Weinert, J. Math. Phys. 22 (1981) 2433 

[7] P. Blaha, K. Schwarz, G. Madsen, D. Kvasnicka and J. Luitz, 

WIEN2k - http://www.wien2k.at/ 
[8] S. Blugel, G. Bihlmayer, D. Wortmann, C. Friedrich, M. Heide, 

M. Lezaic, F. Freimuth, and M. Betzinger, The Jiilich FLEUR 

project - http://www.flapw.de 
[9] M. Weinert, R. Podloucky, J. Redinger and G. Schneider, 

FLAIR - https://pantherfile.uwm.edu/weinert/www/flair.html 
[10] C. Ambrosch-Draxl, Z. Basirat, T. Dengg, R. Golesorkhtabar, 

C. Meisenbichler, D. Nabok, W. Olovsson, P. asquale 
Pavone, S.Sagmeister, and J. Spitaler, The Exciting Code - 
http:/ / exciting-code, org/ 

[11] J. K. Dewhurst, S. Sharma, L. Nordstrom, F. Cricchio, F. Bult- 

mark, and E. K. U. Gross, The Elk Code Manual (Ver. 1.2.20) 

- http://elk.sourceforge.net/ 
[12] K. Nakamura, T. Ito, A. J. Freeman, L.Zhong, and J. F. de 

Castro, Phys. Rev. B 67 (2003) 014420 
[13] P. Kurz, F. Forster, L. Nordstrom, G. Bihlmayer, and S. Blugel, 

Phys. Rev. B 69 (2004) 024415 
[14] C. Ambrosch-Draxl, and J. O. Sofo, Comp. Phys. Comm. 174 

(2006) 14 

[15] C. Cao, P. J. Hirschfeld, and H. P. Cheng, Phys. Rev. B 77 

(2008) 220506(R) 
[16] Y. Zhou, and Y. Saad, Numer. Algor. 47 (2008) 341 
[17] K. Wu, and H. Simon, J. Mat. Anal. Appl. 22 (2000) 602 
[18] Y. Zhou, J. Comp. Phys. 229 (2010) 9188 

[19] Y. Zhou, Y. Saad, M. L. Tiago, J. R. Chelikowsky, 
J. Comp. Phys. 219 (2006) 172 

[20] E. Anderson, Z. Bai, C. Bischof, L.S. Blackford, J. Demmel, 
Jack J. Dongarra, J. Du Croz, S. Hammarling, A. Greenbaum, 
A. McKenney, and D. Sorensen, LAPACK Users' guide, (Society 
for Industrial and Applied Mathematics, Philadelphia, PA USA, 
(third ed.), 1999) 

[21] L.S. Blackford, J. Choi, A. Cleary, E. D'Azeuedo, J. Demmel, 
I. Dhillon, S. Hammarling, G. Henry, A. Petitet, K. Stanley, 

D. Walker, and R.C. Whaley, ScaLAPACK user's guide, (Soci- 
ety for Industrial and Applied Mathematics, Philadelphia, PA 
USA, 1997) 



11 



